1 Introduction

Formula 1 (F1) [1] is the highest class of international single-seater motor racing sanctioned by the Fédération Internationale de l'Automobile (FIA). A Formula One season consists of a series of races, called Grands Prix, held on circuits all over the world. Drivers earn points at each race, and the driver with the most points after the final Grand Prix is the world champion for that year.

Since the 1950s there has been immense technological advancement in the field of motor vehicles. Formula One constructors lead the design of better, faster and more efficient automotive engineering. Formula 1 cars are the fastest regulated road-course racing cars in the world, achieving very high cornering speeds through the large aerodynamic downforce generated by the shape of the car and its wings. Driving a Formula One car is physically demanding, which makes racing one an extremely competitive sport.

The Formula 1 championship consists of 10 teams with 2 drivers each. These 20 drivers are among the best in the world. Because all cars are built to the same technical regulations, winning depends on driving skill, the quality of the car and a bit of luck. Lap times are measured to a thousandth of a second, and the leading cars are often separated by only fractions of a second per lap, which shows how competitive an F1 race gets.

This analysis focuses on one specific race: the 2021 Abu Dhabi Grand Prix [2], the last race of the season. The race consisted of 58 laps. It was a very important race in Formula 1 history because this one race decided the championship between two drivers.

Lewis Hamilton is a 7-time Formula 1 World Champion. Had he won this race, he would have become world champion for the 8th time, breaking the record of 7 titles that he shares with another legendary driver, Michael Schumacher.

Max Verstappen was a young driver in his 20s who had never won a world championship before, but his outstanding performance during the season made him a title contender. Coincidentally, both Lewis Hamilton and Max Verstappen had 369.5 points coming into the race. This made the race the championship decider: whichever of the two finished ahead of the other would, in practice, be the next world champion.

1.1 Rationale

There was a controversy associated with the Abu Dhabi Grand Prix. Up to the closing laps, Lewis Hamilton was leading the race and Max Verstappen was in 2nd position. Then one of the drivers crashed on lap 53 of 58. When there is a crash, a safety car is deployed. The safety car drives ahead of the field at a controlled pace while the marshals clear the track of debris. One of the important safety car rules is that no driver may overtake another while the safety car is out, because the race is neutralised.

Lewis Hamilton and Max Verstappen were so fast that they had caught up with cars that were a full lap behind: while the leaders were on lap 57, these cars were still completing lap 56. Because the safety car was out, nobody was allowed to overtake these lapped cars, so on lap 57 there were 5 lapped cars between Hamilton and Verstappen. The then race director, Michael Masi, allowed only those 5 cars to overtake the safety car and unlap themselves. This was widely seen as a breach of the safety car rules, and it removed the traffic between Verstappen and Hamilton.

Watch the final lap here

With just 1 lap of racing left, the situation was extremely intense: the two championship contenders were in 1st and 2nd position. Max Verstappen overtook Hamilton on the final lap and won the 2021 Abu Dhabi Grand Prix, taking the World Championship title for the first time.

This analysis asks whether Max Verstappen still had a chance of winning the race if those lapped cars had not been allowed to overtake the safety car. Past performance is a reasonable indicator of the chance of success, so we use the 2021 season statistics to evaluate both drivers' performance.

1.2 Dataset

The Ergast Developer API [3] is an experimental web service that provides a historical record of motor racing data for non-commercial purposes. The data set contains detailed information about drivers, races and results since 1950, including lap-by-lap times for recent seasons. It can be downloaded from the Ergast website as a zip archive of CSV files. The archive consists of 14 CSV files, which are linked by identifiers such as raceId and driverId. This report uses 7 of those files.

2 Research Questions

  1. How did both drivers perform prior to this race?
  2. Which driver has won the most races at this circuit?
  3. How was the lap performance of both drivers in the race?
  4. Was Lewis Hamilton more likely to win if the lapped cars weren’t allowed to overtake the safety car?

3 Data Analysis

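Loading the required packages and reading the CSV files into their respective variables:

# Packages used throughout the analysis
library(dplyr)      # data manipulation and the %>% pipe
library(ggplot2)    # plots
library(patchwork)  # combining ggplot objects with +
library(plotly)     # ggplotly()

drivers <- read.csv("f1db_csv/drivers.csv")
results <- read.csv("f1db_csv/results.csv")
lap_times <- read.csv("f1db_csv/lap_times.csv")
circuits <- read.csv("f1db_csv/circuits.csv")
constructors <- read.csv("f1db_csv/constructors.csv")
qualifying <- read.csv("f1db_csv/qualifying.csv")
races <- read.csv("f1db_csv/races.csv")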

This is the Abu Dhabi Grand Prix circuit, which has been assigned a circuitId of 24:

abudhabi_circuit <- circuits %>% 
  filter(location == "Abu Dhabi") %>% 
  select(circuitId, name, location, country)
abudhabi_circuit %>% knitr::kable()

| circuitId | name | location | country |
|---|---|---|---|
| 24 | Yas Marina Circuit | Abu Dhabi | UAE |

Selecting all the races that took place at the Abu Dhabi circuit:

abudhabi_races <- races %>% 
  filter(circuitId == abudhabi_circuit$circuitId)


Selecting required columns from the results data set and saving it to new variable new_results.

new_results <- results %>% 
  select(raceId,driverId, constructorId, positionText, points, laps, time, fastestLap, fastestLapTime, fastestLapSpeed)
new_results


Selecting the columns required from races and assigning to the variable new_races:

new_races <- races %>% 
  select(raceId, year, circuitId, name)
new_races


Intermediary variable to merge new_results and new_races by raceId.

test <- merge(x=new_results,y=new_races,by="raceId")
test %>% 
  filter(year==2021)
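
As a side note on the merge step: merge() performs an inner join by default, so a result row is kept only if its raceId also appears in the races table. The following is a minimal sketch in base R; the toy data frames and their values are hypothetical and are not taken from the Ergast files.

```r
# Toy stand-ins for new_results and new_races (hypothetical values).
toy_results <- data.frame(
  raceId   = c(1, 1, 2, 3),
  driverId = c(10, 20, 10, 20),
  points   = c(25, 18, 25, 18)
)
toy_races <- data.frame(
  raceId = c(1, 2),
  year   = c(2020, 2021),
  name   = c("Race A", "Race B")
)

# merge() defaults to an inner join: raceId 3 has no match and is dropped.
toy_merged <- merge(x = toy_results, y = toy_races, by = "raceId")

# Keep only one season, as the analysis does for 2021.
toy_2021 <- toy_merged[toy_merged$year == 2021, ]
print(toy_2021)
```

If unmatched result rows should be kept instead, merge() accepts all.x = TRUE, which behaves like a left join.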


Using the above variables, we create a variable with all the information about Hamilton's performance in the 2021 season (driverId 1):

hamilton_performance <- test %>% 
  filter(year==2021, driverId==1)
hamilton_performance <- left_join(hamilton_performance, drivers, by="driverId")
hamilton_performance <- hamilton_performance %>% 
  select(driverId,code, forename, surname, positionText, points, laps, time, fastestLap, fastestLapTime, fastestLapSpeed, circuitId, raceId, year, name)
hamilton_performance %>% knitr::kable()

| driverId | code | forename | surname | positionText | points | laps | time | fastestLap | fastestLapTime | fastestLapSpeed | circuitId | raceId | year | name |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | HAM | Lewis | Hamilton | 1 | 25.0 | 57 | 1:24:28.471 | 50 | 1:25.084 | 227.633 | 78 | 1051 | 2021 | Qatar Grand Prix |
| 1 | HAM | Lewis | Hamilton | 1 | 25.0 | 56 | 1:32:03.897 | 44 | 1:34.015 | 207.235 | 3 | 1052 | 2021 | Bahrain Grand Prix |
| 1 | HAM | Lewis | Hamilton | 2 | 19.0 | 63 | +22.000 | 60 | 1:16.702 | 230.403 | 21 | 1053 | 2021 | Emilia Romagna Grand Prix |
| 1 | HAM | Lewis | Hamilton | 1 | 25.0 | 66 | 1:34:31.421 | 47 | 1:20.933 | 206.971 | 75 | 1054 | 2021 | Portuguese Grand Prix |
| 1 | HAM | Lewis | Hamilton | 1 | 25.0 | 66 | 1:33:07.680 | 54 | 1:20.665 | 208.640 | 4 | 1055 | 2021 | Spanish Grand Prix |
| 1 | HAM | Lewis | Hamilton | 7 | 7.0 | 78 | +1:08.231 | 69 | 1:12.909 | 164.769 | 6 | 1056 | 2021 | Monaco Grand Prix |
| 1 | HAM | Lewis | Hamilton | 15 | 0.0 | 51 | +17.668 | 43 | 1:44.769 | 206.270 | 73 | 1057 | 2021 | Azerbaijan Grand Prix |
| 1 | HAM | Lewis | Hamilton | 2 | 19.0 | 71 | +35.743 | 71 | 1:07.058 | 231.811 | 70 | 1058 | 2021 | Styrian Grand Prix |
| 1 | HAM | Lewis | Hamilton | 2 | 18.0 | 53 | +2.904 | 44 | 1:37.410 | 215.903 | 34 | 1059 | 2021 | French Grand Prix |
| 1 | HAM | Lewis | Hamilton | 4 | 12.0 | 71 | +46.452 | 55 | 1:08.126 | 228.177 | 70 | 1060 | 2021 | Austrian Grand Prix |
| 1 | HAM | Lewis | Hamilton | 1 | 25.0 | 52 | 1:58:23.284 | 45 | 1:29.699 | 236.430 | 9 | 1061 | 2021 | British Grand Prix |
| 1 | HAM | Lewis | Hamilton | 2 | 18.0 | 70 | +2.736 | 49 | 1:18.715 | 200.363 | 11 | 1062 | 2021 | Hungarian Grand Prix |
| 1 | HAM | Lewis | Hamilton | 3 | 7.5 | 1 | +2.601 |  |  |  | 13 | 1063 | 2021 | Belgian Grand Prix |
| 1 | HAM | Lewis | Hamilton | 2 | 19.0 | 72 | +20.932 | 72 | 1:11.097 | 215.654 | 39 | 1064 | 2021 | Dutch Grand Prix |
| 1 | HAM | Lewis | Hamilton | R | 0.0 | 25 |  | 3 | 1:25.870 | 242.864 | 14 | 1065 | 2021 | Italian Grand Prix |
| 1 | HAM | Lewis | Hamilton | 1 | 25.0 | 53 | 1:30:41.001 | 43 | 1:37.575 | 215.760 | 71 | 1066 | 2021 | Russian Grand Prix |
| 1 | HAM | Lewis | Hamilton | 5 | 10.0 | 58 | +41.812 | 52 | 1:32.763 | 207.160 | 5 | 1067 | 2021 | Turkish Grand Prix |
| 1 | HAM | Lewis | Hamilton | 2 | 19.0 | 56 | +1.333 | 41 | 1:38.485 | 201.521 | 69 | 1069 | 2021 | United States Grand Prix |
| 1 | HAM | Lewis | Hamilton | 2 | 18.0 | 71 | +16.555 | 66 | 1:19.820 | 194.116 | 32 | 1070 | 2021 | Mexico City Grand Prix |
| 1 | HAM | Lewis | Hamilton | 1 | 25.0 | 71 | 1:32:22.851 | 46 | 1:11.982 | 215.503 | 18 | 1071 | 2021 | São Paulo Grand Prix |
| 1 | HAM | Lewis | Hamilton | 1 | 26.0 | 50 | 2:06:15.118 | 47 | 1:30.734 | 244.962 | 77 | 1072 | 2021 | Saudi Arabian Grand Prix |
| 1 | HAM | Lewis | Hamilton | 2 | 18.0 | 58 | +2.256 | 43 | 1:26.615 | 219.495 | 24 | 1073 | 2021 | Abu Dhabi Grand Prix |

Determining which position Hamilton held how many times over the season.

hamilton_performance  %>% 
  group_by(positionText) %>% 
  summarise(count=n()) %>% 
  rename(Position = positionText , Races = count) %>% 
  arrange(as.numeric(Position)) %>% 
  knitr::kable()

| Position | Races |
|---|---|
| 1 | 8 |
| 2 | 8 |
| 3 | 1 |
| 4 | 1 |
| 5 | 1 |
| 7 | 1 |
| 15 | 1 |
| R | 1 |

From the above table we can see that Hamilton performed very consistently: he finished 1st in 8 races and 2nd in 8 races. R stands for retired, meaning that the car did not finish the race, for example because of a technical problem or a collision.


Now we’ll do the same thing for Verstappen:

verstappen_performance <- test %>% 
  filter(year==2021, driverId == 830)
verstappen_performance <- left_join(verstappen_performance, drivers, by="driverId")
verstappen_performance <-verstappen_performance %>% 
  select(driverId,code, forename, surname, positionText, points, laps, time, fastestLap, fastestLapTime, fastestLapSpeed, circuitId, raceId, year, name)
verstappen_performance %>% knitr::kable()

| driverId | code | forename | surname | positionText | points | laps | time | fastestLap | fastestLapTime | fastestLapSpeed | circuitId | raceId | year | name |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 830 | VER | Max | Verstappen | 2 | 19.0 | 57 | +25.743 | 57 | 1:23.196 | 232.799 | 78 | 1051 | 2021 | Qatar Grand Prix |
| 830 | VER | Max | Verstappen | 2 | 18.0 | 56 | +0.745 | 41 | 1:33.228 | 208.984 | 3 | 1052 | 2021 | Bahrain Grand Prix |
| 830 | VER | Max | Verstappen | 1 | 25.0 | 63 | 2:02:34.598 | 60 | 1:17.524 | 227.960 | 21 | 1053 | 2021 | Emilia Romagna Grand Prix |
| 830 | VER | Max | Verstappen | 2 | 18.0 | 66 | +29.148 | 62 | 1:20.695 | 207.581 | 75 | 1054 | 2021 | Portuguese Grand Prix |
| 830 | VER | Max | Verstappen | 2 | 19.0 | 66 | +15.841 | 62 | 1:18.149 | 215.357 | 4 | 1055 | 2021 | Spanish Grand Prix |
| 830 | VER | Max | Verstappen | 1 | 25.0 | 78 | 1:38:56.820 | 58 | 1:14.649 | 160.929 | 6 | 1056 | 2021 | Monaco Grand Prix |
| 830 | VER | Max | Verstappen | R | 0.0 | 45 |  | 44 | 1:44.481 | 206.839 | 73 | 1057 | 2021 | Azerbaijan Grand Prix |
| 830 | VER | Max | Verstappen | 1 | 25.0 | 71 | 1:22:18.925 | 68 | 1:08.017 | 228.542 | 70 | 1058 | 2021 | Styrian Grand Prix |
| 830 | VER | Max | Verstappen | 1 | 26.0 | 53 | 1:27:25.770 | 35 | 1:36.404 | 218.156 | 34 | 1059 | 2021 | French Grand Prix |
| 830 | VER | Max | Verstappen | 1 | 26.0 | 71 | 1:23:54.543 | 62 | 1:06.200 | 234.815 | 70 | 1060 | 2021 | Austrian Grand Prix |
| 830 | VER | Max | Verstappen | R | 0.0 | 0 |  |  |  |  | 9 | 1061 | 2021 | British Grand Prix |
| 830 | VER | Max | Verstappen | 9 | 2.0 | 70 | +1:20.244 | 43 | 1:20.945 | 194.843 | 11 | 1062 | 2021 | Hungarian Grand Prix |
| 830 | VER | Max | Verstappen | 1 | 12.5 | 1 | 3:27.071 |  |  |  | 13 | 1063 | 2021 | Belgian Grand Prix |
| 830 | VER | Max | Verstappen | 1 | 25.0 | 72 | 1:30:05.395 | 60 | 1:13.275 | 209.244 | 39 | 1064 | 2021 | Dutch Grand Prix |
| 830 | VER | Max | Verstappen | R | 0.0 | 25 |  | 25 | 1:25.173 | 244.852 | 14 | 1065 | 2021 | Italian Grand Prix |
| 830 | VER | Max | Verstappen | 2 | 18.0 | 53 | +53.271 | 28 | 1:38.396 | 213.959 | 71 | 1066 | 2021 | Russian Grand Prix |
| 830 | VER | Max | Verstappen | 2 | 18.0 | 58 | +14.584 | 53 | 1:32.759 | 207.169 | 5 | 1067 | 2021 | Turkish Grand Prix |
| 830 | VER | Max | Verstappen | 1 | 25.0 | 56 | 1:34:36.552 | 52 | 1:39.096 | 200.278 | 69 | 1069 | 2021 | United States Grand Prix |
| 830 | VER | Max | Verstappen | 1 | 25.0 | 71 | 1:38:39.086 | 52 | 1:18.999 | 196.134 | 32 | 1070 | 2021 | Mexico City Grand Prix |
| 830 | VER | Max | Verstappen | 2 | 18.0 | 71 | +10.496 | 47 | 1:12.486 | 214.005 | 18 | 1071 | 2021 | São Paulo Grand Prix |
| 830 | VER | Max | Verstappen | 2 | 18.0 | 50 | +11.825 | 35 | 1:31.488 | 242.943 | 77 | 1072 | 2021 | Saudi Arabian Grand Prix |
| 830 | VER | Max | Verstappen | 1 | 26.0 | 58 | 1:30:17.345 | 39 | 1:26.103 | 220.800 | 24 | 1073 | 2021 | Abu Dhabi Grand Prix |

verstappen_performance  %>% 
  group_by(positionText) %>% 
  summarise(count=n()) %>% 
  rename(Position = positionText , Count = count) %>% 
  arrange(as.numeric(Position)) %>% 
  knitr::kable()

| Position | Count |
|---|---|
| 1 | 10 |
| 2 | 8 |
| 9 | 1 |
| R | 3 |

From the above table we can see that Verstappen finished 1st in 10 races and 2nd in 8 races. That is a better record than Lewis Hamilton's. However, he lost a lot of points because he retired from 3 races.
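
To put the two position tables side by side, the counts can be turned into a few summary figures. This is a minimal sketch in base R that simply re-enters the counts from the two tables above; the helper season_summary is a hypothetical name introduced here for illustration and is not part of the original analysis.

```r
# Finishing-position counts copied from the two summary tables above.
# "R" marks a retirement.
hamilton   <- c("1" = 8, "2" = 8, "3" = 1, "4" = 1, "5" = 1,
                "7" = 1, "15" = 1, "R" = 1)
verstappen <- c("1" = 10, "2" = 8, "9" = 1, "R" = 3)

# Hypothetical helper: condense one driver's counts into headline figures.
season_summary <- function(counts) {
  races <- sum(counts)
  c(
    races       = races,
    wins        = unname(counts["1"]),
    top_two     = unname(counts["1"] + counts["2"]),
    retirements = unname(counts["R"]),
    win_share   = round(unname(counts["1"]) / races, 3)
  )
}

comparison <- rbind(
  Hamilton   = season_summary(hamilton),
  Verstappen = season_summary(verstappen)
)
print(comparison)
```

Both drivers are listed for 22 races, so the win and top-two counts are directly comparable without further normalisation.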


# Histograms of finishing positions for both drivers.
# Retirements ("R") become NA in as.numeric() and are dropped from the plot.
hamilton_hist <- ggplot(hamilton_performance, aes(x = as.numeric(positionText))) +
  geom_histogram(binwidth = 0.5) +
  labs(x = "Position", title = "Hamilton's positions in 2021 season") +
  scale_y_continuous(limits = c(0, 10), breaks = seq(0, 10, 1)) +
  scale_x_continuous(limits = c(0, 20), breaks = seq(0, 20, 2))

verstappen_hist <- ggplot(verstappen_performance, aes(x = as.numeric(positionText))) +
  geom_histogram(binwidth = 0.5) +
  labs(x = "Position", title = "Verstappen's positions in 2021 season") +
  scale_y_continuous(limits = c(0, 10), breaks = seq(0, 10, 1)) +
  scale_x_continuous(limits = c(0, 20), breaks = seq(0, 20, 2))

# The patchwork package's + operator places the two plots side by side.
combined_plots <- hamilton_hist + verstappen_hist
combined_plots

Cleaning the drivers datafile and selecting required columns:

new_drivers <- drivers %>% 
  select(driverId, forename, surname, code, number)
new_drivers

Selecting just the names and ids from the constructors data file:

new_constructors <- constructors %>% 
  select(constructorId, name)
new_constructors

Merging all the data variables into one. abudhabi_grand_prix has the data of all the races and drivers that have ever raced on this track.

abudhabi_grand_prix <- abudhabi_circuit
abudhabi_grand_prix <-left_join(abudhabi_grand_prix, new_races, by="circuitId")
abudhabi_grand_prix <-left_join(abudhabi_grand_prix, new_results, by="raceId")
abudhabi_grand_prix <-left_join(abudhabi_grand_prix, new_drivers, by="driverId")
abudhabi_grand_prix <-left_join(abudhabi_grand_prix, new_constructors, by="constructorId")
abudhabi_grand_prix

Verstappen's first Formula 1 season as a race driver was 2015, so we only consider the races that took place from 2015 to 2021 (the filter below keeps years strictly between 2014 and 2022). Assigning this to a new variable abudhabi_grand_prix_clean.

abudhabi_grand_prix_clean <- abudhabi_grand_prix %>% 
  select(year,positionText,points,fastestLap, fastestLapTime, forename, surname, name.y, name, driverId) %>% 
  filter(year > 2014 & year < 2022)
abudhabi_grand_prix_clean

Let’s see how many times each driver has won on this track.

abudhabi_grand_prix_clean %>% 
  filter(positionText == "1", surname %in% c("Hamilton", "Verstappen")) %>% 
  group_by(surname) %>% 
  # every remaining row is a win, so counting rows gives the number of wins
  summarise(wins = n()) %>% knitr::kable()

| surname | wins |
|---|---|
| Hamilton | 3 |
| Verstappen | 2 |

Between 2015 and 2021, Hamilton won 3 times and Verstappen won twice on this track.


Now let's calculate how many points they scored in those wins on this track (the filter on positionText keeps only 1st-place finishes).

abudhabi_grand_prix_clean %>% 
  filter(positionText == 1, surname=="Hamilton" | surname == "Verstappen") %>% 
  group_by(surname) %>% 
  summarise(points_scored = sum(points)) %>% knitr::kable()

| surname | points_scored |
|---|---|
| Hamilton | 76 |
| Verstappen | 51 |

As expected, Hamilton has scored more points from wins because he has more of them. But considering that Verstappen's Formula 1 career is much shorter, his 2 wins are already notable.


Let’s also determine the constructor points scored on this race track.

abudhabi_grand_prix_clean %>% 
  filter(name == "Red Bull" | name == "Mercedes") %>% 
  group_by(name) %>% 
  summarise(total_points = sum(points)) %>% knitr::kable()

| name | total_points |
|---|---|
| Mercedes | 261 |
| Red Bull | 157 |

Mercedes has a proven track record of consistently scoring on this track. It has scored 261 points and Red Bull has scored 157 in a period of 7 years.

These are the race statistics of just Hamilton and Verstappen in the 2021 Abu Dhabi Grand Prix:

abudhabi_gp_21 <- abudhabi_grand_prix_clean %>% 
  filter(year == 2021, surname == "Hamilton" | surname == "Verstappen")
abudhabi_gp_21 %>% knitr::kable()

| year | positionText | points | fastestLap | fastestLapTime | forename | surname | name.y | name | driverId |
|---|---|---|---|---|---|---|---|---|---|
| 2021 | 1 | 26 | 39 | 1:26.103 | Max | Verstappen | Abu Dhabi Grand Prix | Red Bull | 830 |
| 2021 | 2 | 18 | 43 | 1:26.615 | Lewis | Hamilton | Abu Dhabi Grand Prix | Mercedes | 1 |

Interestingly, Max Verstappen's fastest lap of 1:26.103 is about 500 milliseconds faster than Lewis Hamilton's 1:26.615.
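
The gap between the two fastest laps can be checked by converting the lap time strings to seconds. The sketch below uses base R only; lap_time_to_seconds is a hypothetical helper introduced for illustration, and it assumes the "m:ss.SSS" format seen in the fastestLapTime column.

```r
# Hypothetical helper: convert "m:ss.SSS" lap time strings to seconds.
lap_time_to_seconds <- function(x) {
  parts <- strsplit(unname(x), ":", fixed = TRUE)
  vapply(parts,
         function(p) as.numeric(p[1]) * 60 + as.numeric(p[2]),
         numeric(1))
}

# Fastest laps copied from the table above.
fastest <- c(Verstappen = "1:26.103", Hamilton = "1:26.615")
seconds <- setNames(lap_time_to_seconds(fastest), names(fastest))

# Gap between the two fastest laps, in seconds.
gap <- seconds[["Hamilton"]] - seconds[["Verstappen"]]
print(round(gap, 3))
```

The same helper could be applied to a whole fastestLapTime column, although missing values in the data would need to be handled first.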

These are the detailed lap statistics of the 2021 Abu Dhabi Grand Prix (raceId: 1073) for Hamilton (driverId: 1) and Verstappen (driverId: 830):

ab_gp_laptimes <- lap_times %>% 
  filter(driverId == 1 | driverId == 830, raceId==1073)
ab_gp_laptimes
ab_gp_laptimes <- left_join(ab_gp_laptimes, new_drivers, by ="driverId")

Let’s calculate the average lap time for both the drivers

ab_gp_laptimes %>% 
  left_join(abudhabi_gp_21, by="driverId") %>% 
  group_by(surname.x) %>% 
  summarise(Average_laptime_in_seconds = mean(milliseconds)/1000) %>% knitr::kable()

| surname.x | Average_laptime_in_seconds |
|---|---|
| Hamilton | 93.4414 |
| Verstappen | 93.4025 |

Here we can see that Verstappen is about 39 milliseconds per lap faster than Hamilton on average.
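
As a check on the size of this gap, the two averages from the table can be compared directly. The sketch below is plain arithmetic in base R on the values copied from the table; the variable names are introduced here for illustration only.

```r
# Average lap times in seconds, copied from the table above.
avg <- c(Hamilton = 93.4414, Verstappen = 93.4025)

# Absolute gap per lap in milliseconds and relative gap in percent.
gap_ms  <- (avg[["Hamilton"]] - avg[["Verstappen"]]) * 1000
gap_pct <- 100 * (avg[["Hamilton"]] - avg[["Verstappen"]]) / avg[["Hamilton"]]

# Over the 58 laps of the race, the per-lap gap accumulates to (in seconds):
total_gap_s <- gap_ms * 58 / 1000

print(round(c(gap_ms = gap_ms, gap_pct = gap_pct, total_gap_s = total_gap_s), 3))
```

The accumulated figure is close to the +2.256 seconds shown in Hamilton's results row for this race, which is what one would expect when both drivers complete all 58 laps: the difference in average lap time is essentially the finishing margin divided by the number of laps.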

Let’s consider the qualifying results to determine which driver has a better grasp on the track.

new_qualifying <- left_join(qualifying, new_races, by = "raceId")
new_qualifying <- left_join(new_qualifying, new_drivers, by = "driverId")
new_qualifying
qualifying_plot <- new_qualifying %>% 
  filter(name == "Abu Dhabi Grand Prix", driverId == 1 | driverId == 830, year>2014 & year<2022) %>% 
  ggplot(aes(x=year,y=q3, color=surname)) + geom_point(size=8) +
  scale_x_continuous(limits = c(2015, 2021), breaks = seq(2015, 2021, 1)) +
  geom_text(aes(label = q3), color = "black", fontface = "bold", vjust = 1.5,
            hjust = -0.2, size = 3) +
  labs(x="Year", y = "Q3 Times",title = "Qualifying times for Hamilton and Verstappen", subtitle = "Q3 - 2015 to 2021")
ggplotly(qualifying_plot)

Here we can see that, over the years, both drivers have improved their Q3 times. It is also clearly noticeable that Verstappen was the faster of the two in 2020 and 2021.


4 Conclusions

From all the findings and observations above, it is fair to conclude that Verstappen performed consistently well over the season. His performance on the Abu Dhabi circuit was also commendable: his average lap time was slightly lower than Hamilton's, and he was faster in Q3 in 2020 and 2021.

There were controversies regarding the race director allowing lapped cars to overtake the safety car, which was certainly an advantage to Max Verstappen. But considering Max's performance, it seems likely that he would have overtaken those cars anyway once the race restarted, and there was a good chance that he would still have won the race.

So it is unfair to say with certainty that Hamilton would have won if the cars had not been allowed to overtake the safety car. [4] Verstappen's performance was on point and he seems slightly faster than Hamilton overall. Hence it is likely that Verstappen would still have won the 2021 Abu Dhabi Grand Prix and, with it, his first World Championship title!

5 References


  1. Formula 1 Wiki
  2. Abu Dhabi Grand Prix 2021 Wiki
  3. Ergast Developer API
  4. BBC article: Formula One: Safety car rules tweaked by FIA in wake of controversial 2021 title decider